AITopics | Sint Maarten

Collaborating Authors

Sint Maarten

WikiVideo: Article Generation from Multiple Videos

Martin, Alexander, Kriz, Reno, Walden, William Gantt, Sanders, Kate, Recknor, Hannah, Yang, Eugene, Ferraro, Francis, Van Durme, Benjamin

arXiv.org Artificial IntelligenceApr-1-2025

We present the challenging task of automatically creating a high-level Wikipedia-style article that aggregates information from multiple diverse videos about real-world events, such as natural disasters or political elections. Videos are intuitive sources for retrieval-augmented generation (RAG), but most contemporary RAG workflows focus heavily on text and existing methods for video-based summarization focus on low-level scene understanding rather than high-level event semantics. To close this gap, we introduce WikiVideo, a benchmark consisting of expert-written articles and densely annotated videos that provide evidence for articles' claims, facilitating the integration of video into RAG pipelines and enabling the creation of in-depth content that is grounded in multimodal sources. We further propose Collaborative Article Generation (CAG), a novel interactive method for article creation from multiple videos. CAG leverages an iterative interaction between an r1-style reasoning model and a VideoLLM to draw higher level inferences about the target event than is possible with VideoLLMs alone, which fixate on low-level visual features. We benchmark state-of-the-art VideoLLMs and CAG in both oracle retrieval and RAG settings and find that CAG consistently outperforms alternative methods, while suggesting intriguing avenues for future work.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2504.00939

Country:

Europe > France > Île-de-France > Paris > Paris (0.29)
North America > The Bahamas (0.14)
North America > United States > Georgia (0.14)
(43 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Law Enforcement & Public Safety > Fire & Emergency Services (1.00)
Government > Voting & Elections (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding

Cho, Sukmin, Choi, Sangjin, Hwang, Taeho, Seo, Jeongyeon, Jeong, Soyeong, Lee, Huije, Song, Hoyun, Park, Jong C., Kwon, Youngjin

arXiv.org Artificial IntelligenceFeb-8-2025

Accelerating inference in Large Language Models (LLMs) is critical for real-time interactions, as they have been widely incorporated into real-world services. Speculative decoding, a fully algorithmic solution, has gained attention for improving inference speed by drafting and verifying tokens, thereby generating multiple tokens in a single forward pass. However, current drafting strategies usually require significant fine-tuning or have inconsistent performance across tasks. To address these challenges, we propose Hierarchy Drafting (HD), a novel lossless drafting approach that organizes various token sources into multiple databases in a hierarchical framework based on temporal locality. In the drafting step, HD sequentially accesses multiple databases to obtain draft tokens from the highest to the lowest locality, ensuring consistent acceleration across diverse tasks and minimizing drafting latency. Our experiments on Spec-Bench using LLMs with 7B and 13B parameters demonstrate that HD outperforms existing database drafting methods, achieving robust inference speedups across model sizes, tasks, and temperatures.

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.05609

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
North America > Cuba (0.04)
(19 more...)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Uncovering Regional Defaults from Photorealistic Forests in Text-to-Image Generation with DALL-E 2

Liu, Zilong, Janowicz, Krzysztof, Currier, Kitty, Shi, Meilin

arXiv.org Artificial IntelligenceOct-3-2024

Regional defaults describe the emerging phenomenon that text-to-image (T2I) foundation models used in generative AI are prone to over-proportionally depicting certain geographic regions to the exclusion of others. In this work, we introduce a scalable evaluation for uncovering such regional defaults. The evaluation consists of region hierarchy--based image generation and cross-level similarity comparisons. We carry out an experiment by prompting DALL-E 2, a state-of-the-art T2I generation model capable of generating photorealistic images, to depict a forest. We select forest as an object class that displays regional variation and can be characterized using spatial statistics. For a region in the hierarchy, our experiment reveals the regional defaults implicit in DALL-E 2, along with their scale-dependent nature and spatial relationships. In addition, we discover that the implicit defaults do not necessarily correspond to the most widely forested regions in reality. Our findings underscore a need for further investigation into the geography of T2I generation and other forms of generative AI.

artificial intelligence, machine learning, regional default, (18 more...)

arXiv.org Artificial Intelligence

2410.17255

Country:

South America (0.17)
Europe > Austria > Vienna (0.14)
North America > Central America (0.06)
(16 more...)

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering

Guo, Tiezheng, Wang, Chen, Liu, Yanyi, Tang, Jiawei, Li, Pan, Xu, Sai, Yang, Qingwen, Gao, Xianlin, Li, Zhi, Wen, Yingyou

arXiv.org Artificial IntelligenceAug-5-2024

However, Large langugae models (LLM) have acquired superior reading when dealing with complex multi-document question answering comprehension and reasoning capabilities by pretraining on (MDQA) tasks, accurately understanding the question's extensive natural langugae data [1, 2]. They have demonstrated constraints and covering all supporting evidence remains an remarkable performance on a variety of tasks and benchmarks, open challenge [10, 11]. This difficulty arises because previous particularly in the realm of question answering (QA) [3, 4]. Researchers research has treated the relationship between each text chunk are expanding the parameter scale of these models to and the target question in isolation. The retrieval models have enable them to retain more knowledge [5]. However, due to the concentrated solely on whether the main topic of each chunk absence of efficient methods to evaluate or edit their internalized aligns with the question [12]. Imperfect preprocessing can lead knowledge [6], knowledge-intensive tasks remain a major to the incorrect truncation of continuous chunks.

information, keyword, node, (14 more...)

arXiv.org Artificial Intelligence

2408.02907

Country:

North America > Sint Maarten > Philipsburg (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > China > Liaoning Province > Shenyang (0.04)
(8 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.88)

Add feedback

MIRAI: Evaluating LLM Agents for Event Forecasting

Ye, Chenchen, Hu, Ziniu, Deng, Yihe, Huang, Zijie, Ma, Mingyu Derek, Zhu, Yanqiao, Wang, Wei

arXiv.org Artificial IntelligenceJul-1-2024

Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex problems. Given this capability, increasing interests have been put into employing LLM agents for predicting international events, which can influence decision-making and shape policy development on an international scale. Despite such a growing interest, there is a lack of a rigorous benchmark of LLM agents' forecasting capability and reliability. To address this gap, we introduce MIRAI, a novel benchmark designed to systematically evaluate LLM agents as temporal forecasters in the context of international events. Our benchmark features an agentic environment with tools for accessing an extensive database of historical, structured events and textual news articles. We refine the GDELT event database with careful cleaning and parsing to curate a series of relational prediction tasks with varying forecasting horizons, assessing LLM agents' abilities from short-term to long-term forecasting. We further implement APIs to enable LLM agents to utilize different tools via a code-based interface. In summary, MIRAI comprehensively evaluates the agents' capabilities in three dimensions: 1) autonomously source and integrate critical information from large global databases; 2) write codes using domain-specific APIs and libraries for tool-use; and 3) jointly reason over historical knowledge from diverse formats and time to accurately predict future events. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing the capabilities of LLM agents in forecasting international events, thereby contributing to the development of more accurate and trustworthy models for international relation analysis.

cameocode, isocode, relation, (15 more...)

arXiv.org Artificial Intelligence

2407.01231

Country:

Asia > North Korea (0.14)
Oceania > Australia > Australian Indian Ocean Territories > Territory of Cocos (Keeling) Islands (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(234 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Law (1.00)
Government > Foreign Policy (1.00)
Government > Military (0.93)
Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering

Chu, Zheng, Chen, Jingchang, Chen, Qianglong, Wang, Haotian, Zhu, Kun, Du, Xiyuan, Yu, Weijiang, Liu, Ming, Qin, Bing

arXiv.org Artificial IntelligenceJun-28-2024

Large language models (LLMs) have demonstrated strong reasoning capabilities. Nevertheless, they still suffer from factual errors when tackling knowledge-intensive tasks. Retrieval-augmented reasoning represents a promising approach. However, significant challenges still persist, including inaccurate and insufficient retrieval for complex questions, as well as difficulty in integrating multi-source knowledge. To address this, we propose Beam Aggregation Reasoning, BeamAggR, a reasoning framework for knowledge-intensive multi-hop QA. BeamAggR explores and prioritizes promising answers at each hop of question. Concretely, we parse the complex questions into trees, which include atom and composite questions, followed by bottom-up reasoning. For atomic questions, the LLM conducts reasoning on multi-source knowledge to get answer candidates. For composite questions, the LLM combines beam candidates, explores multiple reasoning paths through probabilistic aggregation, and prioritizes the most promising trajectory. Extensive experiments on four open-domain multi-hop reasoning datasets show that our method significantly outperforms SOTA methods by 8.5%. Furthermore, our analysis reveals that BeamAggR elicits better knowledge collaboration and answer aggregation.

knowledge, reasoning, retrieval, (15 more...)

arXiv.org Artificial Intelligence

2406.1982

Country:

Africa > Namibia (0.14)
Asia > Singapore (0.05)
Africa > Rwanda > Kigali > Kigali (0.04)
(22 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.94)
Energy (0.68)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Curating Grounded Synthetic Data with Global Perspectives for Equitable AI

Törnquist, Elin, Caulk, Robert Alexander

arXiv.org Artificial IntelligenceJun-18-2024

The development of robust AI models relies heavily on the quality and variety of training data available. In fields where data scarcity is prevalent, synthetic data generation offers a vital solution. In this paper, we introduce a novel approach to creating synthetic datasets, grounded in real-world diversity and enriched through strategic diversification. We synthesize data using a comprehensive collection of news articles spanning 12 languages and originating from 125 countries, to ensure a breadth of linguistic and cultural representations. Through enforced topic diversification, translation, and summarization, the resulting dataset accurately mirrors real-world complexities and addresses the issue of underrepresentation in traditional datasets. This methodology, applied initially to Named Entity Recognition (NER), serves as a model for numerous AI disciplines where data diversification is critical for generalizability. Preliminary results demonstrate substantial improvements in performance on traditional NER benchmarks, by up to 7.3%, highlighting the effectiveness of our synthetic data in mimicking the rich, varied nuances of global data sources. This paper outlines the strategies employed for synthesizing diverse datasets and provides such a curated dataset for NER.

arxiv preprint arxiv, dataset, entity type, (15 more...)

arXiv.org Artificial Intelligence

2406.10258

Country:

Europe > France (0.28)
South America > Venezuela (0.04)
South America > Uruguay (0.04)
(122 more...)

Genre: Research Report (1.00)

Industry:

Government (1.00)
Leisure & Entertainment > Sports > Football (0.68)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

You are what you eat? Feeding foundation models a regionally diverse food dataset of World Wide Dishes

Magomere, Jabez, Ishida, Shu, Afonja, Tejumade, Salama, Aya, Kochin, Daniel, Yuehgoh, Foutse, Hamzaoui, Imane, Sefala, Raesetje, Alaagib, Aisha, Semenova, Elizaveta, Crais, Lauren, Hall, Siobhan Mackenzie

arXiv.org Artificial IntelligenceJun-13-2024

Foundation models are increasingly ubiquitous in our daily lives, used in everyday tasks such as text-image searches, interactions with chatbots, and content generation. As use increases, so does concern over the disparities in performance and fairness of these models for different people in different parts of the world. To assess these growing regional disparities, we present World Wide Dishes, a mixed text and image dataset consisting of 765 dishes, with dish names collected in 131 local languages. World Wide Dishes has been collected purely through human contribution and decentralised means, by creating a website widely distributed through social networks. Using the dataset, we demonstrate a novel means of operationalising capability and representational biases in foundation models such as language models and text-to-image generative models. We enrich these studies with a pilot community review to understand, from a first-person perspective, how these models generate images for people in five African countries and the United States. We find that these models generally do not produce quality text and image outputs of dishes specific to different regions. This is true even for the US, which is typically considered to be more well-resourced in training data - though the generation of US dishes does outperform that of the investigated African countries. The models demonstrate a propensity to produce outputs that are inaccurate as well as culturally misrepresentative, flattening, and insensitive. These failures in capability and representational bias have the potential to further reinforce stereotypes and disproportionately contribute to erasure based on region. The dataset and code are available at https://github.com/oxai/world-wide-dishes/.

dall-e 2, dataset, information, (16 more...)

arXiv.org Artificial Intelligence

2406.09496

Country:

North America > United States (0.88)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Africa > Democratic Republic of the Congo (0.14)
(98 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.92)
Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.52)

Add feedback

Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity

Jeong, Soyeong, Baek, Jinheon, Cho, Sukmin, Hwang, Sung Ju, Park, Jong C.

arXiv.org Artificial IntelligenceMar-28-2024

Retrieval-Augmented Large Language Models (LLMs), which incorporate the non-parametric knowledge from external knowledge bases into LLMs, have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA). However, even though there are various approaches dealing with queries of different complexities, they either handle simple queries with unnecessary computational overhead or fail to adequately address complex multi-step queries; yet, not all user requests fall into only one of the simple or complex categories. In this work, we propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs from the simplest to the most sophisticated ones based on the query complexity. Also, this selection process is operationalized with a classifier, which is a smaller LM trained to predict the complexity level of incoming queries with automatically collected labels, obtained from actual predicted outcomes of models and inherent inductive biases in datasets. This approach offers a balanced strategy, seamlessly adapting between the iterative and single-step retrieval-augmented LLMs, as well as the no-retrieval methods, in response to a range of query complexities. We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems, compared to relevant baselines including the adaptive retrieval approaches. Code is available at: https://github.com/starsuzi/Adaptive-RAG.

adaptive-rag, complexity, query, (12 more...)

arXiv.org Artificial Intelligence

2403.14403

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.04)
North America > Canada > Ontario > Toronto (0.04)
(13 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

A Telerehabilitation System for the Selection, Evaluation and Remote Management of Therapies

Anton, David, Berges, Idoia, Bermúdez, Jesús, Goñi, Alfredo, Illarramendi, Arantza

arXiv.org Artificial IntelligenceJan-16-2024

Telerehabilitation systems that support physical therapy sessions anywhere can help save healthcare costs while also improving the quality of life of the users that need rehabilitation. The main contribution of this paper is to present, as a whole, all the features supported by the innovative Kinect-based Telerehabilitation System (KiReS). In addition to the functionalities provided by current systems, it handles two new ones that could be incorporated into them, in order to give a step forward towards a new generation of telerehabilitation systems. The knowledge extraction functionality handles knowledge about the physical therapy record of patients and treatment protocols described in an ontology, named TRHONT, to select the adequate exercises for the rehabilitation of patients. The teleimmersion functionality provides a convenient, effective and user-friendly experience when performing the telerehabilitation, through a two-way real-time multimedia communication. The ontology contains about 2300 classes and 100 properties, and the system allows a reliable transmission of Kinect video depth, audio and skeleton data, being able to adapt to various network conditions. Moreover, the system has been tested with patients who suffered from shoulder disorders or total hip replacement.

interface, physiotherapist, therapy, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.3390/s18051459

2401.08721

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
(14 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.67)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Architecture (0.90)

Add feedback